String operations

In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used on computer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.

1 Alphabet of a string
2 String substitution
3 String homomorphism
4 String projection
5 Right quotient
6 Syntactic relation
7 Right cancellation
8 Prefixes
9 See also
10 References

Alphabet of a string

The alphabet of a string is a list of all of the letters that occur in a particular string. If s is a string, its alphabet is denoted by

$\operatorname{Alph}(s)$

String substitution

Let L be a language, and let $\Sigma$ be its alphabet. A string substitution or simply a substitution is a mapping f that maps letters in $\Sigma$ to languages (possibly in a different alphabet). Thus, for example, given a letter $a\in \Sigma$ , one has $f(a)=L_a$ where $L_a\subset\Delta^*$ is some language whose alphabet is $\Delta$ . This mapping may be extended to strings as

$f(\varepsilon)=\varepsilon$

for the empty string $\varepsilon$ , and

$f(sa)=f(s)f(a)$

for string $s\in L$ . String substitution may be extended to the entire language as

$f(L)=\bigcup_{s\in L} f(s)$

An example of string substitution occurs in regular languages, which are closed under string substitution. That is, if the letters of a regular language are substituted by other regular languages, the result is still a regular language.

Another example is the conversion of an EBCDIC-encoded string to ASCII.

String homomorphism

A string homomorphism (often referred to simply as a homomorphism in formal language theory) is a string substitution such that each letter is replaced by a single string. That is, $f(a)=s$ , where s is a string, for each letter a. String homomorphisms are homomorphisms, preserving the binary operation of string concatenation. Given a language L, the set $f(L)$ is called the homomorphic image of L. The inverse homomorphic image of a string s is defined as

$f^{-1}(s)=\{w\vert f(w)=s\}$

while the inverse homomorphic image of a language L is defined as

$f^{-1}(L)=\{s\vert f(s)\in L\}$

Note that, in general, $f(f^{-1}(L))\ne L$ , while one does have

$f(f^{-1}(L)) \subseteq L$

and

$L \subseteq f^{-1}(f(L))$

for any language L.

A string homomorphism is said to be $\epsilon$ -free (or e-free) if $f(a) \ne \epsilon$ for all $a$ in $\Sigma$ . Simple single-letter substitution ciphers are examples of ( $\epsilon$ -free) string homomorphisms.

String projection

If s is a string, and $\Sigma$ is an alphabet, the string projection of s is the string that results by removing all letters which are not in $\Sigma$ . It is written as $\pi_\Sigma(s)\,$ . It is formally defined by removal of letters from the right hand side:

$\pi_\Sigma(s) = \begin{cases} \varepsilon & \mbox{if } s=\varepsilon \mbox{ the empty string} \\ \pi_\Sigma(t) & \mbox{if } s=ta \mbox{ and } a \notin \Sigma \\ \pi_\Sigma(t)a & \mbox{if } s=ta \mbox{ and } a \in \Sigma \end{cases}$

Here $\varepsilon$ denotes the empty string. The projection of a string is essentially the same as a projection in relational algebra.

String projection may be promoted to the projection of a language. Given a formal language L, its projection is given by

$\pi_\Sigma (L)=\{\pi_\Sigma(s) \vert s\in L \}$

Right quotient

The right quotient of a letter a from a string s is the truncation of the letter a in the string s, from the right hand side. It is denoted as $s/a$ . If the string does not have a on the right hand side, the result is the empty string. Thus:

$(sa)/ b = \begin{cases} s & \mbox{if } a=b \\ \varepsilon & \mbox{if } a \ne b \end{cases}$

The quotient of the empty string may be taken:

$\varepsilon / a = \varepsilon$

Similarly, given a subset $S\subset M$ of a monoid $M$ , one may define the quotient subset as

$S/a=\{s\in M \vert sa\in S\}$

Left quotients may be defined similarly, with operations taking place on the left of a string.

Syntactic relation

The right quotient of a subset $S\subset M$ of a monoid $M$ defines an equivalence relation, called the right syntactic relation of S. It is given by

$\sim_S \;\,=\, \{(s,t)\in M\times M \vert S/s = S/t \}$

The relation is clearly of finite index (has a finite number of equivalence classes) if and only if the family right quotients is finite; that is, if

$\{S/m \vert m\in M\}$

is finite. In this case, S is a recognizable language, that is, a language that can be recognized by a finite state automaton. This is discussed in greater detail in the article on syntactic monoids.

Right cancellation

The right cancellation of a letter a from a string s is the removal of the first occurrence of the letter a in the string s, starting from the right hand side. It is denoted as $s\div a$ and is recursively defined as

$(sa)\div b = \begin{cases} s & \mbox{if } a=b \\ (s\div b)a & \mbox{if } a \ne b \end{cases}$

The empty string is always cancellable:

$\varepsilon \div a = \varepsilon$

Clearly, right cancellation and projection commute:

$\pi_\Sigma(s)\div a = \pi_\Sigma(s \div a )$

Prefixes

The prefixes of a string is the set of all prefixes to a string, with respect to a given language:

$\operatorname{Pref}_L(s) = \{t \vert s=tu \mbox { for } t,u\in \operatorname{Alph}(L)^*\}$

here $s\in L$ .

The prefix closure of a language is

$\operatorname{Pref} (L) = \bigcup_{s\in L} \operatorname{Pref}_L(s) = \left\{ t\vert s=tu; s\in L; t,u\in \operatorname{Alph}(L)^* \right\}$

Example:
$L=\left\{abc\right\}\mbox{ then } \operatorname{Pref}(L)=\left\{\varepsilon, a, ab, abc\right\}$

A language is called prefix closed if $\operatorname{Pref} (L) = L$ .

The prefix closure operator is idempotent:

$\operatorname{Pref} (\operatorname{Pref} (L)) =\operatorname{Pref} (L)$

The prefix relation is a binary relation $\sqsubseteq$ such that $s\sqsubseteq t$ if and only if $s \in \operatorname{Pref}_L(t)$ .

References

John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley Publishing, Reading Massachusetts, 1979. ISBN 0-201-02988-X. (See chapter 3.)